Queuing Network Models to Predict the Completion Time of the Map Phase of MapReduce Jobs

Authors

  • Daniel A. Menascé
  • Shouvik Bardhan
Abstract

Big Data processing is generally defined as a situation in which the size of the data itself becomes part of the computational problem. This has made divide-and-conquer algorithms, implemented on clusters of multi-core CPUs in Hadoop/MapReduce environments, an important data processing tool for many organizations. Jobs of various kinds, each consisting of a number of automatically parallelized tasks, are scheduled on distributed nodes based on the capacity of the machines. A key challenge in provisioning such jobs in a Hadoop/MapReduce cluster is predicting their completion times from various job characteristics. Standard makespan computations that ignore contention on compute nodes significantly underestimate a job's completion time. This paper proposes a mathematically sound model based on closed queuing networks for predicting the execution time of the map phase of a MapReduce job. The model captures contention at compute nodes and the parallelism gains due to an increased number of slots available to map tasks. Experiments on a single-node and a 2-node Hadoop environment, run with different input split sizes and different map slot sizes, validated the model.
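To illustrate why ignoring contention underestimates completion time, the following is a minimal sketch that compares a contention-free makespan with an estimate from a closed queuing network. It uses textbook exact single-class Mean Value Analysis (MVA) with load-independent stations, which is an assumption made here for illustration and not necessarily the model proposed in the paper. The function names, the two stations (CPU and disk), and the service demands are all hypothetical.

```python
import math


def naive_makespan(num_tasks, slots, task_time):
    """Contention-free estimate: tasks run in waves of `slots` tasks,
    and every task takes its uncontended time `task_time` (seconds)."""
    return math.ceil(num_tasks / slots) * task_time


def mva_throughput(demands, population):
    """Exact single-class MVA for a closed network of load-independent
    queuing stations (textbook algorithm, assumed here for illustration).

    demands[i] is the total service demand of one task at station i (seconds).
    Returns the throughput (tasks/second) with `population` tasks circulating.
    """
    queue = [0.0] * len(demands)  # mean queue lengths with zero tasks
    throughput = 0.0
    for n in range(1, population + 1):
        # Residence time at each station: own service plus waiting behind
        # the tasks already queued when the population was n - 1.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        throughput = n / sum(residence)
        queue = [throughput * r for r in residence]
    return throughput


def contention_aware_map_time(num_tasks, slots, demands):
    """Approximate map-phase time as num_tasks / throughput, treating the
    number of map slots as the number of concurrently circulating tasks."""
    return num_tasks / mva_throughput(demands, slots)


if __name__ == "__main__":
    demands = [4.0, 2.0]  # hypothetical CPU and disk demands per map task
    print(naive_makespan(16, 4, sum(demands)))
    print(contention_aware_map_time(16, 4, demands))
```

With these hypothetical demands the contention-aware estimate is larger than the naive one, because four concurrent tasks queue for the same CPU and disk. A multi-core node would need a load-dependent or multi-server station, which this sketch does not model.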


Similar resources

Boosting MapReduce with Network-Aware Task Assignment

Running MapReduce in a shared cluster has become a recent trend to process large-scale data analytics applications while improving the cluster utilization. However, the network sharing among various applications can lead to constrained and heterogeneous network bandwidth available for MapReduce applications. This further increases the severity of network hotspots in racks, and makes existing ta...


Network-Aware Task Assignment for MapReduce Applications in Shared Clusters

Running MapReduce applications in shared clusters is becoming increasingly compelling to improve the cluster utilization. However, the network sharing across diverse applications can make the network bandwidth for MapReduce applications constrained and heterogeneous, which inevitably increases the severity of network hotspots in racks, and makes the existing task assignment policies that focus ...


Phurti: Application and Network-aware Flow Scheduling for Mapreduce

Traffic for a typical MapReduce job in a datacenter consists of multiple network flows. Traditionally, network resources have been allocated to optimize network-level metrics such as flow completion time or throughput. Some recent schemes propose using application-aware scheduling which can reduce the average job completion time. However, most of them treat the core network as a black box with ...


Real-time Scheduling of a Flexible Manufacturing System using a Two-phase Machine Learning Algorithm

The static and analytic scheduling approach is very difficult to follow and is not always applicable in real-time. Most of the scheduling algorithms are designed to be established in offline environment. However, we are challenged with three characteristics in real cases: First, problem data of jobs are not known in advance. Second, most of the shop’s parameters tend to be stochastic. Third, th...


A Mathematical Analysis on Linkage of a Network of Queues with Two Machines in a Flow Shop including Transportation Time

This paper represents linkage network of queues consisting of biserial and parallel servers linked to a common server in series with a flowshop scheduling system consisting of two machines. The significant transportation time of the jobs from one machine to another is also considered. Further, the completion time of jobs/customers (waiting time + service time) in the queue network is the set...




Publication date: 2012